Walkthrough

This update adds guides for two Text to Speech APIs: the Live Text to Speech API and the Pre-recorded Text to Speech API. The guides cover connecting to a WebSocket for live synthesis and selecting voices and languages for pre-recorded audio synthesis, giving users more control over the voice and language of the generated audio.
1. **Pre-recorded Speaker Voices**:
   This mode uses a library of pre-recorded voices.
   You select a voice that suits your requirements and provide the text you wish to synthesize.
   The API then generates audio using the selected voice.
   The available voices include Claribel Dervla, Daisy Studious, Gracie Wise, and many more.
   Each voice has its own tone and style, providing a range of options for your audio content.

2. **Cloned Speaker Voice**:
   If you require a more personalized voice, this mode lets you clone a specific voice from an audio file you provide.
   This is particularly useful for creating a unique voice for your brand or for specific characters in storytelling applications.
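The two modes differ only in which voice field accompanies `text` and `language`. The following Python sketch builds either payload from the field names used in this PR. The helper name `build_synthesis_payload` is hypothetical. The PR does not say how the cloned-voice sample must be supplied, so that value is passed through unchanged.

```python
from typing import Optional

# speaker_voice_behaviour values as written in this PR's Request Format section.
PRE_RECORDED = "pre recorded voice"
CLONED = "cloned voice"


def build_synthesis_payload(
    text: str,
    language: str,
    pre_recorded_voice: Optional[str] = None,
    cloned_voice: Optional[str] = None,
) -> dict:
    """Build a JSON-serializable body for one of the two modes (hypothetical helper).

    Exactly one of pre_recorded_voice or cloned_voice must be given.
    """
    if (pre_recorded_voice is None) == (cloned_voice is None):
        raise ValueError("pass exactly one of pre_recorded_voice or cloned_voice")

    payload = {"text": text, "language": language}
    if pre_recorded_voice is not None:
        # Pre-recorded mode: the voice is selected by name, e.g. "Claribel Dervla".
        payload["speaker_voice_behaviour"] = PRE_RECORDED
        payload["pre_recorded_speaker_voice"] = pre_recorded_voice
    else:
        # Cloned mode: how the sample must be referenced (path, URL, upload) is
        # not specified in this PR, so the caller's value is used as-is.
        payload["speaker_voice_behaviour"] = CLONED
        payload["cloned_speaker_voice"] = cloned_voice
    return payload
```

Requiring exactly one voice argument keeps a request from mixing the two modes by accident.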
The "Modes of Operation" section clearly differentiates between using pre-recorded speaker voices and cloning a speaker voice. For the "Cloned Speaker Voice" mode, it would be helpful to include more details on how to provide the audio file for cloning, such as acceptable formats and how to encode or reference the file in the request.
### Request Format

To use the API, send a POST request to `/text/audio/audio-synthesis/` with a JSON payload specifying the `speaker_voice_behaviour` (either `"pre recorded voice"` or `"cloned voice"`), the chosen `pre_recorded_speaker_voice` (a voice name) or `cloned_speaker_voice` (a voice sample file), depending on the mode, the `text` to be synthesized, and the `language`.
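The following is a minimal sketch of sending this request with Python's standard library. It uses the full URL from the example payloads in this PR. The helper names `make_synthesis_request` and `synthesize` are hypothetical. The authentication header `x-gladia-key` is an assumption, because this PR does not document how requests are authenticated.

```python
import json
import urllib.error
import urllib.request

# Full endpoint URL as shown in this PR's example payloads.
API_URL = "https://api.gladia.io/text/audio/audio-synthesis/"


def make_synthesis_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the JSON payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: auth header name is not documented in this PR.
            "x-gladia-key": api_key,
        },
    )


def synthesize(payload: dict, api_key: str, timeout: float = 60.0) -> bytes:
    """Send the request and return the raw response body (expected to be WAV audio)."""
    req = make_synthesis_request(payload, api_key)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # The error body format is not documented in this PR; surface it verbatim.
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"synthesis failed with HTTP {err.code}: {detail}") from err
```

Separating request construction from sending makes the request easy to inspect without network access.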
In the "Request Format" section, the documentation provides a concise overview of the required JSON payload structure. To improve clarity, consider adding a brief description for each field in the payload, especially for fields like speaker_voice_behaviour, to explain the expected values and their effects.
### Response

The API responds with an audio file at 24,000 Hz in WAV/PCM format, suitable for applications ranging from virtual assistants to audiobooks.
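Because the response is described as a 24,000 Hz WAV/PCM file, a client can check it with Python's built-in `wave` module. This sketch assumes the body is a standard PCM WAV container, as the section states. The helper `describe_wav` is hypothetical and only reads the header fields.

```python
import io
import wave

EXPECTED_RATE = 24_000  # Hz, as stated in the Response section


def describe_wav(audio: bytes) -> dict:
    """Read the WAV header of a response body and report its basic format."""
    with wave.open(io.BytesIO(audio), "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": wav.getnframes() / rate,
            "matches_documented_rate": rate == EXPECTED_RATE,
        }
```

A mismatch in `matches_documented_rate` would suggest the body is not the documented format, for example an error message returned instead of audio.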
The "Response" section clearly states the format and quality of the audio file returned by the API. For completeness, consider mentioning any potential error responses and their meanings to help users troubleshoot issues with their requests.
Here is an example payload for a pre-recorded voice:

POST https://api.gladia.io/text/audio/audio-synthesis/

```json
{
  "speaker_voice_behaviour": "pre recorded voice",
  "pre_recorded_speaker_voice": "Claribel Dervla",
  "text": "Hello, welcome to our Text to Speech API. This is an example using a pre-recorded speaker voice.",
  "language": "en"
}
```

Here is an example payload for a cloned speaker voice:

POST https://api.gladia.io/text/audio/audio-synthesis/

```json
{
  "speaker_voice_behaviour": "cloned voice",
  "cloned_speaker_voice": "file_path_to_cloned_voice_sample.wav",
  "text": "This is an example using a cloned speaker voice.",
  "language": "en"
}
```
The "Example" section provides useful payload examples for both pre-recorded and cloned speaker voices. To enhance this section, consider adding example responses, including both successful audio file returns and examples of error responses. This would provide a more comprehensive guide for users to understand the API's behavior.
### Supported Languages

The API supports a variety of languages, including English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Hungarian (hu), Korean (ko), Japanese (ja), and Hindi (hi). This range of languages makes it easy to create audio content for a global audience.
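Because the language is passed as a short code, a client can reject unsupported codes before sending a request. The set below is copied from the list above. The PR does not state whether the API is case-sensitive or accepts other variants, so this sketch (hypothetical `check_language`) accepts exact matches only.

```python
# Codes copied from the Supported Languages list in this PR.
SUPPORTED_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese", "hu": "Hungarian", "ko": "Korean", "ja": "Japanese",
    "hi": "Hindi",
}


def check_language(code: str) -> str:
    """Return the code unchanged if it is in the documented list, else raise."""
    if code not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"unsupported language {code!r}; expected one of: {supported}")
    return code
```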
The "Supported Languages" section is informative and mirrors the content in the live-text-to-speech.mdx file. Consistency between documents is good, but ensure that any updates to supported languages are reflected across all relevant documentation to maintain accuracy.
The Live Text to Speech API offers a diverse range of speaker voices to choose from. Here is a list of available voices:

- Gitta Nikolina
- Henriette Usha
- Sofia Hellen
- Tammy Grit
- Tanja Adelina
- Vjollca Johnnie
- Andrew Chipper
- Badr Odhiambo
- Dionisio Schuyler
- Royston Min
- Viktor Eka
- Abrahan Mack
- Adde Michal
- Baldur Sanjin
- Craig Gutsy
- Damien Black
- Gilberto Mathias
- Ilkin Urbano
- Kazuhiko Atallah
- Ludvig Milivoj
- Suad Qasim
- Torcull Diarmuid
- Viktor Menelaos
- Zacharie Aimilios
- Nova Hogarth
- Maja Ruoho
- Uta Obando
- Lidiya Szekeres
- Chandra MacFarland
- Szofi Granger
- Camilla Holmström
- Lilya Stainthorpe
- Zofija Kendrick
- Narelle Moon
- Barbora MacLean
- Alexandra Hisakawa
- Alma María
- Rosemary Okafor
- Ige Behringer
- Filip Traverse
- Damjan Chapman
- Wulf Carlevaro
- Aaron Dreschner
- Kumar Dahl
- Eugenio Mataracı
- Ferran Simen
- Xavier Hayasaka
- Luis Moray
- Marcos Rudaski
The "Available Speaker Voices" section, similar to the one in the live-text-to-speech.mdx file, lists the speaker voices. It's crucial to keep this list updated and consistent across all documentation. As previously mentioned, categorizing the voices could significantly improve user experience.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files ignored due to path filters (1)
- `mint.json` is excluded by `!**/*.json`
Files selected for processing (2)
- chapters/text-to-speech-api/pages/live-text-to-speech.mdx (1 hunks)
- chapters/text-to-speech-api/pages/text-to-speech.mdx (1 hunks)
Files skipped from review as they are similar to previous changes (2)
- chapters/text-to-speech-api/pages/live-text-to-speech.mdx
- chapters/text-to-speech-api/pages/text-to-speech.mdx
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- chapters/text-to-speech-api/pages/live-text-to-speech.mdx (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- chapters/text-to-speech-api/pages/live-text-to-speech.mdx
Summary by CodeRabbit